{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "# 2020语言与智能技术竞赛：事件抽取任务--方案分享(Test1:Rank17 Test2:Rank18)\n",
    "\n",
    "# 本模型在官方PaddleHub版本Baseline上进行修改得到\n",
    "\n",
    "官方原版地址:https://github.com/PaddlePaddle/Research/tree/master/KG/DuEE_baseline/DuEE-PaddleHub\n",
    "\n",
    "本方案github地址:https://github.com/onewaymyway/DuEE_2020\n",
    "\n",
    "# 本方案在官方baseline的基础上的改动\n",
    "\n",
    "1.在网络结构上在CRF层前面增加了双向GRU层（代码见sequence_label.py中SequenceLabelTaskSP类）\n",
    "\n",
    "2.将trigger预测结果拼接到text前面进行第二阶段的role预测(代码见data_process.py的data_process函数中model=role1的情况)，这个改动可以解决同一个句子不同event之间role重叠的问题\n",
    "\n",
    "3.在训练上，本方案先只用train进行训练，然后再将dev放入train进行最后的训练\n",
    "\n",
    "4.增加了简单的最终结果剔除机制(代码见datachecker.py)\n",
    "\n",
    "\n",
    "## 注意\n",
    "\n",
    "本项目代码需要使用GPU环境来运行:\n",
    "\n",
    "<img src=\"https://ai-studio-static-online.cdn.bcebos.com/767f625548714f03b105b6ccb3aa78df9080e38d329e445380f505ddec6c7042\" width=\"40%\" height=\"40%\">\n",
    "<br>\n",
    "<br>\n",
    "并且检查相关参数设置, 例如use_gpu, fluid.CUDAPlace(0)等处是否设置正确. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "# 事件抽取任务\n",
    "\n",
    "事件抽取任务的目标是通过给定目标事件类型和角色类型集合及句子，识别句子中所有目标事件类型的事件，并根据论元角色集合抽取事件所对应的论元。其中目标事件类型（event_type）和论元角色（role）限定了抽取的范围，例如：（event_type：胜负，role：时间，胜者，败者，赛事名称）、（event_type：夺冠，role：夺冠事件，夺冠赛事，冠军）。最终将抽取的所有事件论元按如下形式进行输出: {“id”: “id1”, “event_list”: [{“event_type”:“T1”, “arguments”: [{“role”:“R1”, “argument”:“A1”},…]}, {“event_type”:“T2”, “arguments”: [{“role”:“R2”, “argument”:“A2”},…]}]}，比赛将对参赛者最终输出的论元列表进行评估。\n",
    "\n",
    "# 方案\n",
    "本方案分两个阶段，第一个阶段为事件类型抽取，第二阶段为事件论元抽取，两个阶段的模型都是 预训练模型+GRU+CRF进行序列标注，接下来用一个例子来大概讲讲流程。\n",
    "\n",
    "假设当前要预测的句子为：\n",
    "\n",
    "历经4小时51分钟的体力、意志力鏖战，北京时间9月9日上午纳达尔在亚瑟·阿什球场，以7比5、6比3、5比7、4比6和6比4击败赛会5号种子俄罗斯球员梅德韦杰夫，夺得了2019年美国网球公开赛男单冠军。\n",
    "\n",
    "## 事件类型抽取\n",
    "当输入例子句子之后，事件类型阶段的输出结果类似于是\n",
    "\n",
    "历经4小时51分钟的体力、意志力鏖战，北京时间9月9日上午纳达尔在亚瑟·阿什球场，以7比5、6比3、5比7、4比6和6比4击(B-竞赛行为-胜负)败(I-竞赛行为-胜负)赛会5号种子俄罗斯球员梅德韦杰夫，夺得了2019年美国网球公开赛男单冠(B-竞赛行为-夺冠)军(I-竞赛行为-夺冠)。\n",
    "\n",
    "(击,B-竞赛行为-胜负)(败,I-竞赛行为-胜负)\n",
    "\n",
    "(冠,B-竞赛行为-夺冠)(军,I-竞赛行为-夺冠)\n",
    "\n",
    "其它字符的标注为O\n",
    "\n",
    "然后就得到了这个句子的两个事件类型[竞赛行为-胜负,竞赛行为-夺冠]\n",
    "\n",
    "结果文件存在data1/test1.json.trigger.pred文件中\n",
    "\n",
    "## 论元抽取\n",
    "\n",
    "与官方baseline不同，本模型方案将事件拼接到句子前面进行论元抽取，比如刚才的例子，第二阶段将分别进行两次不同的预测\n",
    "\n",
    "1.竞赛行为-胜负：历经4小时51分钟的体力、意志力鏖战，北京时间9月9日上午纳达尔在亚瑟·阿什球场，以7比5、6比3、5比7、4比6和6比4击败赛会5号种子俄罗斯球员梅德韦杰夫，夺得了2019年美国网球公开赛男单冠军。\n",
    "\n",
    "输出结果：\n",
    "\n",
    "竞赛行为-胜负：历经4小时51分钟的体力、意志力鏖战，北(B-时间)京(I-时间)时(I-时间)间(I-时间)9(I-时间)月(I-时间)9(I-时间)日(I-时间)上(I-时间)午(I-时间)纳(B-胜者)达(I-胜者)尔(I-胜者)在亚瑟·阿什球场，以7比5、6比3、5比7、4比6和6比4击败赛会5(B-败者)号(I-败者)种(I-败者)子(I-败者)俄(I-败者)罗(I-败者)斯(I-败者)球(I-败者)员(I-败者)梅(I-败者)德(I-败者)韦(I-败者)杰(I-败者)夫(I-败者)，夺得了2(B-赛事名称)0(I-赛事名称)1(I-赛事名称)9(I-赛事名称)年(I-赛事名称)美(I-赛事名称)国(I-赛事名称)网(I-赛事名称)球(I-赛事名称)公(I-赛事名称)开(I-赛事名称)赛(I-赛事名称)男单冠军。\n",
    "\n",
    "2.竞赛行为-夺冠：历经4小时51分钟的体力、意志力鏖战，北京时间9月9日上午纳达尔在亚瑟·阿什球场，以7比5、6比3、5比7、4比6和6比4击败赛会5号种子俄罗斯球员梅德韦杰夫，夺得了2019年美国网球公开赛男单冠军。\n",
    "\n",
    "输出结果：\n",
    "\n",
    "竞赛行为-夺冠：历经4小时51分钟的体力、意志力鏖战，北(B-时间)京(I-时间)时(I-时间)间(I-时间)9(I-时间)月(I-时间)9(I-时间)日(I-时间)上(I-时间)午(I-时间)纳(B-冠军)达(I-冠军)尔(I-冠军)在亚瑟·阿什球场，以7比5、6比3、5比7、4比6和6比4击败赛会5号种子俄罗斯球员梅德韦杰夫，夺得了(B-夺冠赛事)2(I-夺冠赛事)0(I-夺冠赛事)1(I-夺冠赛事)9(I-夺冠赛事)年(I-夺冠赛事)美(I-夺冠赛事)国(I-夺冠赛事)网(I-夺冠赛事)球(I-夺冠赛事)公(I-夺冠赛事)开(I-夺冠赛事)赛(I-夺冠赛事)男单冠军。\n",
    "\n",
    "训练的时候也是将同一个句子的不同事件拆开转换成不同的句子进行训练，训练的时候一个事件里只有这个事件的论元，忽略这个句子里其它事件的论元。\n",
    "\n",
    "## 生成最终结果\n",
    "\n",
    "最终结果根据论元抽取结果生成，将相同句子的不同事件合到同一个结果里。\n",
    "\n",
    "```\n",
    "{\n",
    "    \"id\":\"6a10824fe9c7b2aa776aa7e3de35d45d\",\n",
    "    \"event_list\":[\n",
    "        {\n",
    "            \"event_type\":\"竞赛行为-胜负\",\n",
    "            \"arguments\":[\n",
    "                {\n",
    "                    \"role\":\"时间\",\n",
    "                    \"argument\":\"北京时间9月9日上午\"\n",
    "                },\n",
    "                {\n",
    "                    \"role\":\"胜者\",\n",
    "                    \"argument\":\"纳达尔\"\n",
    "                },\n",
    "                {\n",
    "                    \"role\":\"败者\",\n",
    "                    \"argument\":\"5号种子俄罗斯球员梅德韦杰夫\"\n",
    "                },\n",
    "                {\n",
    "                    \"role\":\"赛事名称\",\n",
    "                    \"argument\":\"2019年美国网球公开赛\"\n",
    "                }\n",
    "            ]\n",
    "        },\n",
    "        {\n",
    "            \"event_type\":\"竞赛行为-夺冠\",\n",
    "            \"arguments\":[\n",
    "                {\n",
    "                    \"role\":\"时间\",\n",
    "                    \"argument\":\"北京时间9月9日上午\"\n",
    "                },\n",
    "                {\n",
    "                    \"role\":\"夺冠赛事\",\n",
    "                    \"argument\":\"2019年美国网球公开赛\"\n",
    "                },\n",
    "                {\n",
    "                    \"role\":\"冠军\",\n",
    "                    \"argument\":\"纳达尔\"\n",
    "                }\n",
    "            ]\n",
    "        }\n",
    "    ]\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "# 题外话\n",
    "本方案是最终参赛的方案，在此之外还有一个魔改了PaddleHub的方案，但是效果并没有这个好，但是那个方案可以作为一个魔改PaddleHub的好案例，有时间把那个方案也分享一下。基本思路和这个有点相似，也是将事件信息加到第二阶段的论元抽取，不同的是，这个方案是把文字直接加到了句子前面，另一个方案是把事件做了onehot编码加到了每个词向量上，还将分词信息也加到了词向量上。和这个方案相比，另一个方案实现了一套往PaddleHub模型中的词向量追加Feature的机制（其实我没想明白为啥那个方案效果没这个方案好：（）。\n",
    "\n",
    "本方案排名并不靠前，分享的目的还是希望可以抛砖引玉，希望也能看到前排的大佬们分享一下方案（虽然赛后群里有大佬大致分享了，但是还是想看更详细的分享：））。\n",
    "\n",
    "感觉本次参加比赛的收获挺大的，大概有以下几点：\n",
    "\n",
    "1.大致熟悉了PaddleHub这个框架\n",
    "\n",
    "2.认识了好多大佬，学习到了好多新的知识点\n",
    "\n",
    "3.发现了竟然还有AIStudio这么一个可以白嫖GPU的好地方（因为没GPU其实这是我第一次玩深度学习打比赛）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "# 如何在AIStudio把这个方案跑起来\n",
    "\n",
    "fork这个项目，然后一直往后执行就可以了 : )\n",
    "\n",
    "PS.\n",
    "\n",
    "为了训练方便，本项目接了VisualDL,所以在训练的时候打开VisualDL的页面是可以看到统计信息的\n",
    "\n",
    "比如:\n",
    "![](https://ai-studio-static-online.cdn.bcebos.com/a03481088c6040bb9adb0036d08e41c9cbeb04648dde49f4aa7a74b15692bbd8)\n",
    "\n",
    "## 如何打开VisualDL\n",
    "\n",
    "# Notebooks项目访问URL\n",
    "\n",
    "比如你的notebook网页地址为\n",
    "url_notebook = 'http://aistudio.baidu.com/user/30799/33852/notebooks/33852.ipynb?redirects=1'\n",
    "\n",
    "# 替换后visualdl访问URL\n",
    "url_visualdl = 'http://aistudio.baidu.com/user/30799/33852/visualdl'\n",
    "\n",
    "访问后面那个网址就能打开VisualDL了"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#从github拉代码\r\n",
    "#如果是fork的这个项目，这一步就不用执行了，如果是自己新开的项目可以执行这一步\r\n",
    "#!svn checkout https://github.com/PaddlePaddle/Research.git/trunk/KG/DuEE_baseline/DuEE-PaddleHub/ ./baseline\r\n",
    "!svn checkout https://github.com/onewaymyway/DuEE_2020.git/trunk ./baseline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/home/aistudio/baseline\n"
     ]
    }
   ],
   "source": [
    "# 切换到代码目录\r\n",
    "%cd baseline/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#安装依赖\r\n",
    "!pip install -r ./requirements.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#这个代码不用执行！！！\r\n",
    "#打包文件夹下的文件 用于本地备份 排除数据目录和模型目录\r\n",
    "\r\n",
    "!zip -r mb.zip * -x \"./models/*\" -x \"./model/*\" -x \"./data/*\" -x \"./data1/*\" -x \"./orzdata/*\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#拉取最新的比赛数据\r\n",
    "\r\n",
    "!rm -rf ./orzdata\r\n",
    "!mkdir ./orzdata\r\n",
    "!rm -rf ./data\r\n",
    "!mkdir ./data\r\n",
    "%cd ./orzdata\r\n",
    "\r\n",
    "#训练数据 dev数据 test1\r\n",
    "!wget -O train.zip https://dataset-bj.cdn.bcebos.com/event_extraction/train_data.json.zip\r\n",
    "!unzip train.zip\r\n",
    "!wget -O dev.zip https://dataset-bj.cdn.bcebos.com/event_extraction/dev_data.json.zip\r\n",
    "!unzip dev.zip\r\n",
    "!wget -O test.zip https://dataset-bj.cdn.bcebos.com/event_extraction/test1_data.json.zip\r\n",
    "!unzip test.zip\r\n",
    "\r\n",
    "!cp -r ./dev_data/dev.json ../data/dev.json\r\n",
    "!cp -r ./train_data/train.json ../data/train.json\r\n",
    "!cp -r ./test1_data/test1.json ../data/test1.json\r\n",
    "\r\n",
    "#schema数据\r\n",
    "!wget -O schema.zip https://ai.baidu.com/file/9C92719AF96D4DDB96477BFBE1435262\r\n",
    "!unzip schema.zip\r\n",
    "\r\n",
    "!cp -r ./event_schema/event_schema.json ../data/event_schema.json\r\n",
    "\r\n",
    "#test2\r\n",
    "!wget -O test2.zip https://dataset-bj.cdn.bcebos.com/lic2020/test2_data.json.zip\r\n",
    "!unzip test2.zip\r\n",
    "!cp -r ./test2.json ../data/test2.json\r\n",
    "\r\n",
    "%cd ../"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#将数据拷到data1防止做实验把数据覆盖了\r\n",
    "!cp -r ./data ./data1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#!/usr/bin/env python\r\n",
    "# -*- coding: utf-8 -*-\r\n",
    "\r\n",
    "#为了做实验方便封装的脚本调用\r\n",
    "\r\n",
    "import os\r\n",
    "import subprocess\r\n",
    "\r\n",
    "\r\n",
    "def executeSh(sh):\r\n",
    "    '''\r\n",
    "    执行sh命令的封装\r\n",
    "    '''\r\n",
    "    print(\"sh:\",sh)\r\n",
    "    v=os.popen(sh)\r\n",
    "    lines=v.readlines()\r\n",
    "    for line in lines:\r\n",
    "        print(line)\r\n",
    "        #print(line.decode(\"utf-8\"))\r\n",
    "        #print(line.decode(\"gb2312\"))\r\n",
    "    v.close()\r\n",
    "def executeShs(shlist):\r\n",
    "    '''\r\n",
    "    执行多个sh命令的封装\r\n",
    "    '''\r\n",
    "    for sh in shlist:\r\n",
    "        executeSh(sh)\r\n",
    "    \r\n",
    "def savebestToTemp(modelpath):\r\n",
    "    '''\r\n",
    "    将modelpath/best_model复制到modelpath/curtest\r\n",
    "    主要用于保存和测试当前的best_model,防止best_model被覆盖无法重现测试的结果\r\n",
    "    '''\r\n",
    "    tar=modelpath+\"/curtest\"\r\n",
    "    src=modelpath+\"/best_model/.\"\r\n",
    "    sh=\"rm -rf \"+tar\r\n",
    "    executeSh(sh)\r\n",
    "    sh=\"mkdir \"+tar\r\n",
    "    executeSh(sh)\r\n",
    "    sh=\"cp -r \"+src+\" \"+tar\r\n",
    "    executeSh(sh)\r\n",
    "\r\n",
    "def checkPredFile(filePath):\r\n",
    "    '''\r\n",
    "    删除结果中不合理的元素\r\n",
    "    '''\r\n",
    "    executeSh(\"python datachecker.py --targetFile \"+filePath)\r\n",
    "\r\n",
    "def run_gru_model(triggermodel,testFile,saveFile):\r\n",
    "    '''\r\n",
    "    一次性跑完从trigger预测到role预测再到输出最终结果\r\n",
    "    triggermodel:使用的trigger模型目录\r\n",
    "    testFile:要预测的文件\r\n",
    "    saveFile:最终输出文件\r\n",
    "    '''\r\n",
    "    shs=[\r\n",
    "        \"sh run_trigger_gru_predict_withmodel.sh 0 ./data1/ models/trigger_gru  {0} {1}\".format(testFile,triggermodel),\r\n",
    "        \"python data_process.py --action predict_data_process2 --trigger_file data1/{0}.trigger.pred  --schema_file data/event_schema.json --save_path data1/{0}_triggerpred.json\".format(testFile,triggermodel),\r\n",
    "        \"sh run_role1_gru_predict.sh 0 ./data1/ models/rolegru {0}_triggerpred.json\".format(testFile,triggermodel),\r\n",
    "        \"python data_process.py --action predict_data_process3 --role_file data1/{0}_triggerpred.json.role1.pred --schema_file data1/event_schema.json --save_path data1/{2}\".format(testFile,triggermodel,saveFile)\r\n",
    "    ]\r\n",
    "    #print(shs)\r\n",
    "    executeShs(shs)\r\n",
    "\r\n",
    "def run_gru_modelWithOutmark(triggermodel,testFile,outmark=\"\"):\r\n",
    "    '''\r\n",
    "    生成带mark的预测文件\r\n",
    "    主要用于模型之间的融合\r\n",
    "    仅用于实验，比赛最终结果没用模型融合\r\n",
    "    '''\r\n",
    "    shs=[\r\n",
    "        \"sh run_trigger_gru_predict_withmodel.sh 0 ./data1/ models/trigger_gru  {0} {1}\".format(testFile,triggermodel),\r\n",
    "        \"python data_process.py --action predict_data_process2 --trigger_file data1/{0}.trigger.pred  --schema_file data/event_schema.json --save_path data1/{0}{2}_triggerpred.json\".format(testFile,triggermodel,outmark)\r\n",
    "    ]\r\n",
    "    #print(shs)\r\n",
    "    executeShs(shs)  \r\n",
    "\r\n",
    "def runRolePredict(triggerfile=\"test1_triggerpred.json\",rolemodel=\"models/rolegru\",gru=False,predictmodel=\"best_model\",outmark=\"\",finalFile=None,ifCheck=False,data_dir=\"./data1\"):\r\n",
    "    '''\r\n",
    "    根据trigger预测结果进行role预测并生成最终的结果\r\n",
    "    '''\r\n",
    "    sh=\"python sequence_label.py --num_epoch 30 \\\r\n",
    "    --learning_rate 3e-5 \\\r\n",
    "    --data_dir {data_dir} \\\r\n",
    "    --schema_path {data_dir}/event_schema.json \\\r\n",
    "    --train_data {data_dir}/train.json \\\r\n",
    "    --dev_data {data_dir}/dev.json \\\r\n",
    "    --test_data {data_dir}/dev.json \\\r\n",
    "    --predict_data {data_dir}/{predictfile} \\\r\n",
    "    --do_train False \\\r\n",
    "    --do_predict False \\\r\n",
    "    --do_predict2 True \\\r\n",
    "    --do_model role1 \\\r\n",
    "    --add_gru {gru} \\\r\n",
    "    --predictmodel {predictmodel} \\\r\n",
    "    --max_seq_len 256 \\\r\n",
    "    --batch_size 8 \\\r\n",
    "    --model_save_step 3000 \\\r\n",
    "    --eval_step 200 \\\r\n",
    "    --checkpoint_dir {ckpt_dir}\".format(data_dir=data_dir,predictfile=triggerfile,gru=gru,ckpt_dir=rolemodel,predictmodel=predictmodel)\r\n",
    "    if outmark:\r\n",
    "        sh+=\" --outmark \"+outmark\r\n",
    "    executeSh(sh)\r\n",
    "    if finalFile:\r\n",
    "        makefinalFile((\"{data_dir}/\"+triggerfile+\".role1{outmark}.pred\").format(data_dir=data_dir,outmark=outmark),finalFile)\r\n",
    "        if ifCheck:\r\n",
    "            checkPredFile(finalFile)\r\n",
    "\r\n",
    "def makefinalFile(rolePredFile=\"data1/test1_triggerpred_md.json.role1.pred\",savefile=\"data1/test1fn_pred.json\"):\r\n",
    "    '''\r\n",
    "    根据role预测结果生成最终提交数据\r\n",
    "    '''\r\n",
    "    sh=\"python data_process.py --action predict_data_process3 --role_file {0} --schema_file data1/event_schema.json --save_path {1}\".format(rolePredFile,savefile)\r\n",
    "    #sh=\"sh makefinalpred.sh {0} {1}\".format(rolePredFile,savefile)\r\n",
    "    executeSh(sh)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#为了快速跑流程 代码中的预训练模型改成了ernie_tiny\r\n",
    "#实际比赛用的是chinese-roberta-wwm-ext-large\r\n",
    "#要达到比赛的分数效果可以在sequence_label.py中将预训练模型改为chinese-roberta-wwm-ext-large\r\n",
    "#为了简洁，删掉了很多做实验的代码和脚本\r\n",
    "#训练建议到终端里进行训练，因为在终端训练关了浏览器再进还能看到输出 ： ）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#训练trigger_gru 加了gru层版本trigger识别模型\r\n",
    "!sh run_trigger_gru.sh 0 ./data1/ models/trigger_gru"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#训练trigger_gru_mix 将dev数据也放入训练\r\n",
    "!sh run_trigger_gru_mix.sh 0 ./data1/ models/trigger_gru"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#trigger_gru 预测\r\n",
    "#预测结果存在data1/test1.json.trigger.pred\r\n",
    "!sh run_trigger_gru_predict.sh 0 ./data1/ models/trigger_gru  test1.json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#根据trigger_gru预测结果数据生成role预测需要的数据\r\n",
    "#结果文件为data1/test1_triggerpred.json\r\n",
    "#具体代码见data_process.py predict_data_process2函数\r\n",
    "!python data_process.py --action predict_data_process2 --trigger_file data1/test1.json.trigger.pred  --schema_file data/event_schema.json --save_path data1/test1_triggerpred.json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#训练role1 gru 训练加了gru层的role识别模型\r\n",
    "!sh run_role1_gru.sh 0 ./data1/ models/rolegru"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#训练role1_gru mix 将dev也加入训练\r\n",
    "!sh run_role1_gru_mix.sh 0 ./data1/ models/rolegru"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#role1_gru预测\r\n",
    "#结果文件为data1/test1_triggerpred.json.role1.pred\r\n",
    "!sh run_role1_gru_predict.sh 0 ./data1/ models/rolegru test1_triggerpred.json\r\n",
    "#role1 生成最终提交结果\r\n",
    "#结果文件为data1/test1fn_pred.json\r\n",
    "!python data_process.py --action predict_data_process3 --role_file data1/test1_triggerpred.json.role1.pred --schema_file data1/event_schema.json --save_path data1/test1fn_pred.json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#从结果中删除明显不合理的结果，这一步有一点提升，不做也没大影响\r\n",
    "#结果文件为data1/test1fn_pred_md.json\r\n",
    "checkPredFile(\"./data1/test1fn_pred.json\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#AIStudio更新代码编辑器之后出现编辑sh文件后无法执行sh文件的问题\r\n",
    "#需要将每行的\\r手动去掉才能正常执行\r\n",
    "\r\n",
    "from nlputils import readFile,saveFile\r\n",
    "def adptSHFile(filePath):\r\n",
    "    txt=readFile(filePath)\r\n",
    "    txt=txt.replace(\"\\r\",\"\")\r\n",
    "    saveFile(filePath,txt)\r\n",
    "\r\n",
    "adptSHFile(\"run_role1_gru.sh\")\r\n",
    "adptSHFile(\"run_role1_gru_mix.sh\")\r\n",
    "adptSHFile(\"run_role1_gru_eval.sh\")\r\n",
    "adptSHFile(\"run_role1_gru_predict.sh\")\r\n",
    "adptSHFile(\"run_trigger_gru.sh\")\r\n",
    "adptSHFile(\"run_trigger_gru_predict.sh\")\r\n",
    "adptSHFile(\"run_trigger_gru_mix.sh\")\r\n",
    "adptSHFile(\"run_trigger_gru_predict_withmodel.sh\")\r\n",
    "print(\"ok\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "请点击[此处](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576)查看本环境基本用法.  <br>\n",
    "Please click [here ](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576) for more detailed instructions. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "PaddlePaddle 1.7.1 (Python 2.7)",
   "language": "python",
   "name": "py27-paddle1.2.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
